University of Delaware
Department of Electrical and Computer Engineering
Computer Architecture and Parallel Systems Laboratory

Diamond Tiling: A Tiling Framework for Time-iterated Scientific Applications
Abstract
This paper fully develops Diamond Tiling, a technique for partitioning the computations of stencil applications such as FDTD (finite-difference time-domain). Diamond Tiling results from optimizing the number of useful computations that can be executed each time a region of memory is loaded into the local memory of a multiprocessor chip. It advances the state of the art in time tiling by combining the following characteristics: (1) it optimizes reuse, that is, the number of computations that can be executed per region of memory loaded; (2) the locality optimization does not depend on code structure, because the program instructions are partitioned using the data dependencies between computations, so stencil computations with any loop structure can be optimized; (3) the program partitions (tiles) produced by Diamond Tiling are fully parallel and require no redundant computation; (4) code generation is simple and can easily be incorporated into an optimizing compiler; and (5) the technique applies to N-dimensional stencil computations. Experimental evidence for these claims was gathered using FDTD, a commonly used stencil application, running on the recently developed Cyclops-64 processor. The results show that stencil applications using Diamond Tiling achieve lower running times and fewer total off-chip memory operations than other state-of-the-art tiling techniques.
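The following minimal sketch illustrates the idea. It is not the paper's code generation scheme. It applies diamond tiling to a 1D, 3-point averaging stencil with fixed boundaries, which is a simplified stand-in for an FDTD-style update. The functions `naive` and `diamond_tiled` and the tile width `w` are hypothetical names introduced only for illustration. The rotated coordinates a = t + i and b = t - i are a common way to express diamond tiles for this dependence pattern, but the paper's exact tile shapes and sizes may differ. The sketch checks two of the properties claimed above:

* Tiles in the same wavefront can execute in any order, so they could run in parallel.
* Every point is computed exactly once, with no redundant work.

```python
from collections import defaultdict


def naive(u0, T):
    """Reference: T sweeps of a 3-point averaging stencil, fixed boundaries."""
    cur = list(u0)
    for _ in range(T):
        nxt = list(cur)  # boundary values cur[0] and cur[-1] are carried over
        for i in range(1, len(cur) - 1):
            nxt[i] = (cur[i - 1] + cur[i] + cur[i + 1]) / 3.0
        cur = nxt
    return cur


def diamond_tiled(u0, T, w, reverse_within_wavefront=False):
    """Same computation, executed tile by tile over diamond-shaped tiles.

    Hypothetical illustration only: w is the tile width in rotated coordinates.
    """
    n = len(u0)
    nan = float("nan")
    # Full space-time array for clarity. Uncomputed cells hold NaN, so reading
    # a dependency before it has been written would poison the result.
    u = [[nan] * n for _ in range(T + 1)]
    u[0] = list(u0)
    for t in range(1, T + 1):
        u[t][0], u[t][n - 1] = u0[0], u0[n - 1]

    # Rotated coordinates a = t + i, b = t - i. Point (t, i) depends on
    # (t-1, i-1), (t-1, i), (t-1, i+1), all of which have a' <= a and b' <= b.
    # A w-by-w square in (a, b) is therefore a diamond in (t, i), and its
    # inputs come only from itself or from tiles with a smaller A + B.
    tiles = defaultdict(list)
    for t in range(1, T + 1):
        for i in range(1, n - 1):
            tiles[((t + i) // w, (t - i) // w)].append((t, i))

    # Group tiles into wavefronts by A + B. Tiles in one wavefront share no
    # dependencies, so they could run in parallel (here: sequentially).
    wavefronts = defaultdict(list)
    for key in tiles:
        wavefronts[key[0] + key[1]].append(key)

    computed = 0
    for level in sorted(wavefronts):
        for key in sorted(wavefronts[level], reverse=reverse_within_wavefront):
            for t, i in sorted(tiles[key]):  # increasing t inside each tile
                u[t][i] = (u[t - 1][i - 1] + u[t - 1][i] + u[t - 1][i + 1]) / 3.0
                computed += 1
    assert computed == T * (n - 2)  # every point computed exactly once
    return u[T]


u0 = [float(i % 7) for i in range(40)]
print(diamond_tiled(u0, 25, 8) == naive(u0, 25))
```

Each point performs the same floating-point operations in the same order. Any tile width and any tile order within a wavefront should therefore reproduce `naive` exactly. A real implementation would keep only the tile's working set in local memory rather than the full space-time array used here for readability.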